Sharing data and work across queries in analytical workloads
Author
Abstract
Traditionally, query execution engines in relational databases have followed a query-centric model: they optimize and execute each incoming query using a separate execution plan, independent of other concurrent queries. For workloads with low contention for resources, or workloads with short-lived queries, this model makes the optimization phase faster and creates efficient execution plans. For workloads with heavy contention, or workloads with long-running analytical queries, this model cannot exploit the sharing opportunities that might exist among concurrent queries in order to save I/O, CPU and RAM resources. We argue that exploiting these sharing opportunities is a crucial step towards handling these increasingly common workloads. In this paper, we study three research prototype systems that employ various methodologies for sharing data and work: (a) the QPipe query execution engine [1], which employs a circular scan per table and shares work through simultaneous pipelining, (b) the DataPath system [2], which employs an uninterrupted linear scan per disk and shares work through a global query plan, and (c) the SharedDB system [3], which employs a circular scan per table partition, shares work through a global query plan, uses batched execution, and services both OLTP and OLAP workloads under response time guarantees. We classify these methodologies, analyze their commonalities and differences, and identify their strengths and shortcomings.
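The circular scan mentioned in the abstract can be illustrated with a small sketch. This is a toy model written for this page, not code from QPipe, DataPath, or SharedDB: the class and function names (`CircularScan`, `AttachedQuery`, `run_demo`) and the in-memory "pages" are all assumptions made for illustration. The idea it shows is that one scan cursor reads each page once and hands it to every attached query, and a query that arrives mid-scan simply wraps around until it has seen the whole table.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AttachedQuery:
    """A query currently riding the shared scan (hypothetical structure)."""
    name: str
    consume: Callable[[List[int]], None]  # called once per delivered page
    remaining: int                        # pages this query still needs to see


class CircularScan:
    """Toy shared circular scan over a table stored as a list of pages."""

    def __init__(self, pages: List[List[int]]) -> None:
        if not pages:
            raise ValueError("table must contain at least one page")
        self.pages = pages
        self.pos = 0          # index of the next page the scan will read
        self.active: List[AttachedQuery] = []
        self.page_reads = 0   # number of physical page reads performed

    def attach(self, name: str, consume: Callable[[List[int]], None]) -> None:
        # A new query joins at the current position and needs one full cycle.
        self.active.append(AttachedQuery(name, consume, len(self.pages)))

    def step(self) -> bool:
        """Read one page and deliver it to all attached queries.

        Returns False when no query is attached, so there is nothing to do.
        """
        if not self.active:
            return False
        page = self.pages[self.pos]
        self.page_reads += 1
        for query in self.active:
            query.consume(page)
            query.remaining -= 1
        # Detach queries that have now seen every page exactly once.
        self.active = [q for q in self.active if q.remaining > 0]
        self.pos = (self.pos + 1) % len(self.pages)  # wrap around
        return True


def run_demo():
    table = [[1, 2], [3, 4], [5, 6]]  # three pages of two rows each
    scan = CircularScan(table)
    results = {"sum": 0, "count": 0}

    def add_sum(page: List[int]) -> None:
        results["sum"] += sum(page)

    def add_count(page: List[int]) -> None:
        results["count"] += len(page)

    scan.attach("q_sum", add_sum)
    scan.step()                        # q_sum sees page 0 alone
    scan.attach("q_count", add_count)  # second query arrives mid-scan
    while scan.step():
        pass
    return results["sum"], results["count"], scan.page_reads
```

In this toy run the two queries together trigger four page reads, whereas running them independently would read the three-page table twice, for six reads. Real systems add concerns this sketch ignores, such as buffering, slow consumers, and concurrency control.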
Similar resources
Shared Execution of Recurring Workloads in MapReduce
With the increasing complexity of data-intensive MapReduce workloads, Hadoop must often accommodate hundreds or even thousands of recurring analytics queries that periodically execute over frequently updated datasets, e.g., latest stock transactions, new log files, or recent news feeds. For many applications, such recurring queries come with user-specified service-level agreements (SLAs), commo...
Sharing Data and Work Across Concurrent Analytical Queries
Today’s data deluge enables organizations to collect massive data, and analyze it with an ever-increasing number of concurrent queries. Traditional data warehouses (DW) face a challenging problem in executing this task, due to their query-centric model: each query is optimized and executed independently. This model results in high contention for resources. Thus, modern DW depart from the queryc...
Improvement of the Analytical Queries Response Time in Real-Time Data Warehouse using Materialized Views Concatenation
A real-time data warehouse is a collection of recent and hierarchical data that is used for managers’ decision-making by creating online analytical queries. The volume of data collected from data sources and entered into the real-time data warehouse is constantly increasing. Moreover, as the volume of input data to the real-time data warehouse increases, the interference between online loading ...
MiniTasking: Improving Cache Performance for Multiple Query Workloads
This paper proposes a novel idea, called MiniTasking, to reduce the number of cache misses by improving the data temporal locality for multiple concurrent queries. Our idea is based on the observation that, in many workloads such as decision support systems (DSS), there is usually a significant amount of data sharing among different concurrent queries. MiniTasking exploits such data sharing charac...
MQJoin: Efficient Shared Execution of Main-Memory Joins
Database architectures typically process queries one-at-a-time, executing concurrent queries in independent execution contexts. Often, such a design leads to unpredictable performance and poor scalability. One approach to circumvent the problem is to take advantage of sharing opportunities across concurrently running queries. In this paper we propose Many-Query Join (MQJoin), a novel method for...
Journal title:
Volume, issue:
Pages: -
Publication date: 2012